Uniqueness of Non-Gaussian Subspace Analysis
Authors
F.J. Theis and M. Kawanabe
Abstract
Dimension reduction provides an important tool for preprocessing large-scale data sets. A possible model for dimension reduction is realized by projecting onto the non-Gaussian part of a given multivariate recording. We prove that the subspaces of such a projection are unique given that the Gaussian subspace is of maximal dimension. This result therefore guarantees that projection algorithms uniquely recover the underlying lower-dimensional data signals.

An important open problem in signal processing is the task of efficient dimension reduction, i.e. the search for meaningful signals within a higher-dimensional data set. Classical techniques such as principal component analysis define 'meaningful' using second-order statistics (maximal variance), which may often be inadequate for signal detection, e.g. in the presence of strong noise. This contrasts with higher-order models such as projection pursuit [1, 2] or non-Gaussian subspace analysis (NGSA) [3, 4]. While the former extracts a single non-Gaussian independent component from the data set, the latter tries to detect a whole non-Gaussian subspace within the data, and no assumption of independence within the subspace is made.

The goal of linear dimension reduction can be defined as the search for a projection W ∈ Mat(n × d) of a d-dimensional random vector X with n < d, such that WX still carries as much information about X as possible. Of course, this last notion has to be made precise in terms of some distance measure or source model. This problem is a special case of the larger field of model selection [1], an important tool for preprocessing and dimension reduction that is used in a wide range of applications.

In the following we will use the notations Gl(n) and O(n) to denote the groups of invertible and orthogonal n × n matrices, respectively. Upper-case symbols are used for both matrices and random vectors, lower-case ones for scalars and vectors. Matlab notation is employed for selecting columns and rows of matrices; for example, A(2:n, :) denotes the matrix consisting of the last (n − 1) rows of A ∈ Mat(n × n). Random variables and vectors are defined on the probability space Ω, and the notation X ∈ L2(Ω, R) means that the random variable X is square-integrable. Finally, we treat only the real case here, although extensions to complex-valued random vectors along the lines of [5] are possible.

Instead of relying on second-order statistics only, NGSA uses higher-order statistics to determine 'interesting' directions [4, 3]. The goal is to find a projection with maximal non-Gaussianity, removing the Gaussian part of X. In other words, the goal is to find a projection W_N ∈ Mat(n × d) such that there exists W_G ∈ Mat((d − n) × d) with W_N X and W_G X independent, and W_G X Gaussian. An intuitive notion of how to choose the reduced dimension n is to require that W_G X is maximally Gaussian, and hence W_N X maximally non-Gaussian. The dimension reduction problem itself can of course also be formulated within a generative model, which leads to the following linear mixing model.
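The model equation itself is truncated in this excerpt. Based on the NGSA formulation cited in [3, 4], it presumably reads as follows; this is a reconstruction under that assumption, not text recovered from the excerpt:

```latex
% Reconstructed NGSA mixing model (assumed from the formulation in
% [3, 4]; the equation is truncated in this excerpt): the observation
% X is an invertible mixture of an n-dimensional non-Gaussian source
% vector S_N and a (d - n)-dimensional Gaussian vector S_G that is
% independent of S_N.
X = AS = A \begin{pmatrix} S_N \\ S_G \end{pmatrix},
\qquad A \in \mathrm{Gl}(d).
```

Under this reconstruction, the projections W_N and W_G introduced above correspond to the first n and the last d − n rows of A⁻¹, respectively, since A⁻¹X = S splits into the independent parts S_N (non-Gaussian) and S_G (Gaussian).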
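To make the projection characterization concrete, here is a minimal numerical sketch in Python. It is not the NGSA algorithm of [3, 4]; it uses a simple ICA-plus-kurtosis ranking in the spirit of projection pursuit to illustrate that, under the mixing model above, the n non-Gaussian directions can be told apart from the Gaussian rest. All variable names and the library choices (NumPy, SciPy, scikit-learn) are assumptions made for illustration:

```python
# Kurtosis-based projection-pursuit sketch of the NGSA setting
# (illustrative only, not the paper's algorithm).
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
T = 20000          # number of samples
n, d = 2, 5        # non-Gaussian dimension n inside d total dimensions

# Source vector S: n non-Gaussian signals stacked over (d - n) Gaussians.
S = np.vstack([
    rng.uniform(-1, 1, T),            # sub-Gaussian component
    rng.laplace(0, 1, T),             # super-Gaussian component
    rng.standard_normal((d - n, T)),  # Gaussian part S_G
])
A = rng.standard_normal((d, d))       # mixing matrix, invertible a.s.
X = A @ S                             # observed d-dimensional mixture

# Extract d components with ICA and rank them by |excess kurtosis|;
# Gaussian directions have population excess kurtosis zero, so the
# n most non-Gaussian components estimate the subspace of W_N X.
ica = FastICA(n_components=d, whiten="unit-variance", random_state=0)
Y = ica.fit_transform(X.T).T          # shape (d, T), recovered components
k = np.abs(kurtosis(Y, axis=1))       # excess kurtosis per component
print("abs. excess kurtosis, sorted:", np.round(np.sort(k)[::-1], 2))
```

The printed values separate into n clearly nonzero entries and d − n near-zero ones, mirroring the split of the data into the non-Gaussian part W_N X and the Gaussian part W_G X.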